Quantifying the impact of remote work on the local labour market in Sweden

A description of Swedish property data

Author

Jonathan Jayes

Published

April 17, 2024

Code
library(tidyverse)

df <- read_csv(here::here("temp/hemnet_properties_for_readme.csv"))

Hello everyone - I hope that you are having fun in Gothenburg! I am sorry that I could not be there to join you - but I am sure that I am having a blast in Oslo at this time.

Kerstin spoke to me briefly about using at modern register data to look at the effect of remote work on location choices in Sweden. For example, we could look at how many people now live further away from the city center, or from where they work, as a result of remote work becoming more prevalent.

It is an elegantly simple idea, and if the data is available, I think would make a neat paper.

There are a number of additions to this - I feel like looking at the JobTech data on the number of roles advertising remote work in the job description would be beneficial.

In addition, we could look at property sales in different areas, and see if there is a correlation between the number of remote work roles and property prices.

Nick Bloom and some colleagues talk about the prevalence of work from home in the US and note that it is still high, even after the pandemic.

They show that it is highest in areas of high population density:

Work from home share by population density in the US

They also find that it is higher for individuals

Property sales data

For my own interest, I have a dataset of property sales in Malmö that could be expanded to the whole country for this purpose.1

Currently I scrape the records of property sales from Hemnet for Malmö each week and store them in a database on GitHub.

The dataset contains information on the final sale price, property type, location, and other features of the property.

Basic time series

Code
df %>% 
filter(type == "Lägenhet") %>%
ggplot(aes(x = sale_date, y = price_per_square_meter)) +
geom_point(alpha = .1) +
geom_smooth(group = 1) +
scale_y_continuous(labels = scales::dollar_format(prefix = "SEK ")) +
labs(title = "Price per square meter over time in Malmö",
     x = "Sale date",
     y = "Price per square meter")

Which areas have the highest price per square meter, on average?

Code
library(ggridges)

# make a ridge plot showing price_per_square_meter by location
df %>%
    mutate(location = fct_lump(location, 10)) %>%
    # filter(location != "Other") %>%
    mutate(location = str_remove(location, ", Malmö kommun")) %>%
    filter(!is.na(location)) %>%
    mutate(location = fct_reorder(location, price_per_square_meter, .fun = median)) %>%
    ggplot(aes(x = price_per_square_meter, y = location, fill = location)) +
    stat_binline(geom = "density_ridges") +
    theme_minimal() +
    theme(legend.position = "none") +
    scale_x_continuous(labels = scales::dollar_format(prefix = "SEK ")) +
    labs(
        title = "Price per square meter by location in Malmö",
        x = "Price per square meter",
        y = "Location"
    )

Mapping

We can show the types of properties in Malmö on a map.

You can see all the information about each property by clicking on the marker.

Code
library(mapview)
library(sf)
library(RColorBrewer)


malmo_bbox = c(left = 12.9, bottom = 55.53, right = 13.1, top = 55.65)

df_filtered <- df %>%
    filter(between(latitude, 55.53, 55.65), between(longitude, 12.9, 13.1)) %>% 
    slice_sample(n = 1000)

df_sf <- st_as_sf(df_filtered, coords = c("longitude", "latitude"), crs = 4326)
Code
mapview(df_sf, zcol = "type")

We can also show the price per square meter on a map.

Code
mapview(df_sf, zcol = "price_per_square_meter", col.regions=brewer.pal(9, "YlOrRd"))

Data summary

Code
df %>% skimr::skim()
Data summary
Name Piped data
Number of rows 45988
Number of columns 27
_______________________
Column type frequency:
character 8
numeric 18
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
title 0 1.00 6 54 0 13428 0
type 538 0.99 4 22 0 12 0
location 538 0.99 12 60 0 1256 0
agent_name 15123 0.67 4 27 0 426 0
agent_link 15123 0.67 61 107 0 449 0
price_development 0 1.00 11 24 0 4066 0
ownership_form 0 1.00 5 23 0 7 0
housing_association 30647 0.33 4 56 0 2504 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
final_price 0 1.00 2053857.07 1200883.74 1.00 1200000.00 1750000.00 2600000.00 1.7700e+07 ▇▁▁▁▁
price_per_square_meter 2229 0.95 30343.74 12231.99 0.00 20769.00 29091.00 38500.00 9.5652e+04 ▃▇▃▁▁
starting_price 66 1.00 1991311.80 1176311.87 13455.00 1152500.00 1695000.00 2495000.00 2.2000e+07 ▇▁▁▁▁
number_of_rooms 52 1.00 2.48 0.96 1.00 2.00 2.00 3.00 8.0000e+00 ▇▅▂▁▁
living_area 0 1.00 67.28 23.10 19.00 51.00 64.00 81.50 2.2600e+02 ▇▇▁▁▁
balcony 27134 0.41 0.75 0.43 0.00 1.00 1.00 1.00 1.0000e+00 ▂▁▁▁▇
year_of_construction 3132 0.93 1940.92 227.45 0.00 1938.00 1955.00 1973.00 1.9556e+04 ▇▁▁▁▁
fee 2220 0.95 3858.72 1232.81 0.00 2990.00 3767.00 4598.00 1.3086e+04 ▂▇▂▁▁
operational_cost 21932 0.52 7008.66 7768.70 12.00 3600.00 4800.00 6600.00 8.0500e+04 ▇▁▁▁▁
leasehold_fee 45808 0.00 7681.44 3008.51 1885.00 5329.50 6992.50 10380.50 1.8460e+04 ▂▇▅▂▁
sale_year 538 0.99 2019.22 2.70 2013.00 2017.00 2020.00 2021.00 2.0240e+03 ▃▆▆▇▇
sale_month 538 0.99 6.35 3.33 1.00 3.00 6.00 9.00 1.2000e+01 ▇▆▅▆▇
sale_day 538 0.99 16.00 8.63 1.00 9.00 16.00 23.00 3.1000e+01 ▇▇▇▇▇
floor_number 27670 0.40 3.20 2.11 0.00 2.00 3.00 4.00 1.9000e+01 ▇▃▁▁▁
top_floor_number 28478 0.38 5.36 2.60 1.00 4.00 5.00 6.00 2.0000e+01 ▇▇▁▁▁
elevator_presence 28293 0.38 0.59 0.49 0.00 0.00 1.00 1.00 1.0000e+00 ▆▁▁▁▇
latitude 0 1.00 55.59 0.04 55.51 55.58 55.59 55.60 5.9290e+01 ▇▁▁▁▁
longitude 0 1.00 13.00 0.06 12.90 12.98 13.01 13.02 1.8120e+01 ▇▁▁▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
sale_date 538 0.99 2013-01-15 2024-04-04 2020-01-07 3891

Data dictionary

Code
# Define column names
colnames <- df %>% colnames()

# Define descriptions (replace with actual descriptions)
descriptions <- c(
    "Final sale price (not original asking price)",
    "Title of advert (property address)",
    "Property type (e.g. apartment, house)",
    "Suburb or area",
    "Date of Sale",
    "Name of estate agent",
    "Link to estate agent website",
    "Price per square meter",
    "Asking price",
    "Difference between asking price and final price",
    "Ownership type (e.g. Äganderätt, Bostadsrätt)",
    "Number of rooms",
    "Living area in square meters",
    "Balcony (1/0/NA)",
    "Year of building construction",
    "Avgift (monthly fee)",
    "Driftskostnad (operating cost)",
    "Leasehold fee",
    "Föreningen (housing association)",
    "Year of sale",
    "Month of sale",
    "Day of sale",
    "Floor number (usually for apartments)",
    "Number of floors in building",
    "Elevator (1/0/NA)",
    "Latitude",
    "Longitude"
)

# Create data dictionary
data_dictionary <- tibble::tibble(
    Column = colnames,
    Description = descriptions
)

# Print data dictionary
library(gt)

data_dictionary %>% 
    gt::gt() %>%
    gt::fmt_number(columns = 1, use_seps = FALSE) %>% 
    gt::tab_header(
        title = md("**Data dictionary**"),
        subtitle = "Description of columns in the dataset")
Data dictionary
Description of columns in the dataset
Column Description
final_price Final sale price (not original asking price)
title Title of advert (property address)
type Property type (e.g. apartment, house)
location Suburb or area
sale_date Date of Sale
agent_name Name of estate agent
agent_link Link to estate agent website
price_per_square_meter Price per square meter
starting_price Asking price
price_development Difference between asking price and final price
ownership_form Ownership type (e.g. Äganderätt, Bostadsrätt)
number_of_rooms Number of rooms
living_area Living area in square meters
balcony Balcony (1/0/NA)
year_of_construction Year of building construction
fee Avgift (monthly fee)
operational_cost Driftskostnad (operating cost)
leasehold_fee Leasehold fee
housing_association Föreningen (housing association)
sale_year Year of sale
sale_month Month of sale
sale_day Day of sale
floor_number Floor number (usually for apartments)
top_floor_number Number of floors in building
elevator_presence Elevator (1/0/NA)
latitude Latitude
longitude Longitude

Descriptive statistics

Code
df %>%
    count(type, sort = T) %>%
    gt() %>%
    tab_header(
        title = md("**Property type**"),
        subtitle = "Number of properties by type"
    ) %>%
    cols_label(type = "Property type", n = "Number of properties")
Property type
Number of properties by type
Property type Number of properties
Lägenhet 42723
Villa 937
NA 538
Radhus 470
Parhus 383
Fritidshus 372
Par-/kedje-/radhus 221
Fritidsboende 182
Kedjehus 131
Övrig 22
Tomt 4
Vinterbonat fritidshus 4
Gård/skog 1

Footnotes

  1. I am scraping this data because I would like to buy an apartment in Malmö in the future.↩︎